Texts or Images? A Fine-grained Analysis on the Effectiveness of Input Representations and Models for Table Question Answering
Zhou, Wei, Mesgar, Mohsen, Adel, Heike, Friedrich, Annemarie
In table question answering (TQA), tables are encoded as either texts or images. Prior work suggests that passing images of tables to multi-modal large language models (MLLMs) performs comparably to or even better than using textual input with large language models (LLMs). However, the lack of controlled setups limits fine-grained distinctions between these approaches. In this paper, we conduct the first controlled study on the effectiveness of several combinations of table representations and models from two perspectives: question complexity and table size. We build a new benchmark based on existing TQA datasets. In a systematic analysis of seven pairs of MLLMs and LLMs, we find that the best combination of table representation and model varies across setups. We propose FRES, a method selecting table representations dynamically, and observe a 10% average performance improvement compared to using both representations indiscriminately.
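The dynamic-selection idea can be sketched as a simple router. The thresholds and criteria below are purely illustrative placeholders, not the actual FRES decision rule:

```python
def select_representation(question_complexity: int, n_rows: int, n_cols: int) -> str:
    """Toy router between an image and a text table representation.

    The thresholds are illustrative placeholders; the actual FRES
    selection criteria come from the paper's analysis, not hard-coded rules.
    """
    # Very large tables rendered as images can exceed what a vision
    # encoder resolves reliably, so fall back to text.
    if n_rows * n_cols > 500:
        return "text"
    # Assume multi-hop questions (complexity >= 3 on some scale) are
    # served better by textual input.
    if question_complexity >= 3:
        return "text"
    return "image"

print(select_representation(1, 10, 5))    # image
print(select_representation(4, 10, 5))    # text
print(select_representation(1, 100, 10))  # text
```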
HIPPO: Enhancing the Table Understanding Capability of Large Language Models through Hybrid-Modal Preference Optimization
Liu, Zhenghao, Wang, Haolan, Li, Xinze, Xiong, Qiushi, Yang, Xiaocui, Gu, Yu, Yan, Yukun, Shi, Qi, Li, Fangfang, Yu, Ge, Sun, Maosong
Tabular data contains rich structural semantics and plays a crucial role in organizing and manipulating information. To better capture these structural semantics, this paper introduces the HybrId-modal Preference oPtimizatiOn (HIPPO) model, which represents tables using both text and image, and optimizes MLLMs to effectively learn more comprehensive table information from these multiple modalities. Specifically, HIPPO samples model responses from hybrid-modal table representations and designs a modality-consistent sampling strategy to enhance response diversity and mitigate modality bias during DPO training. Experimental results on table question answering and table fact verification tasks demonstrate the effectiveness of HIPPO, achieving a 4% improvement over various table reasoning models. Further analysis reveals that HIPPO not only enhances reasoning abilities based on unimodal table representations but also facilitates the extraction of crucial and distinct semantics from different modal representations. All data and codes are available at https://github.com/NEUIR/HIPPO.
Evaluation of Table Representations to Answer Questions from Tables in Documents : A Case Study using 3GPP Specifications
Roychowdhury, Sujoy, Soman, Sumit, Ranjani, HG, Sharma, Avantika, Gunda, Neeraj, Bala, Sai Krishna
With the ubiquitous use of document corpora for question answering, one aspect that is especially relevant for technical documents is the ability to extract information from tables interspersed with text. The major challenge is that, unlike free-flowing text or an isolated set of tables, it is not obvious what constitutes a relevant chunk when representing a table. We conduct a series of experiments examining various representations of tabular data interspersed with text to understand the relative benefits of different representations. We choose a corpus of $3^{rd}$ Generation Partnership Project (3GPP) documents since they are heavily interspersed with tables. We create an expert-curated dataset of question-answer pairs to evaluate our approach. We conclude that row-level representations, with the corresponding table header information included in every cell, improve retrieval performance by leveraging the structural information present in the tabular data.
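The winning representation can be sketched in a few lines. The serialization format below ("header: value" pairs joined per row) and the sample table are illustrative assumptions, not necessarily the exact format used in the study:

```python
def table_to_row_chunks(header, rows):
    """Serialize each table row as one retrieval chunk, repeating the
    column header next to every cell value."""
    return ["; ".join(f"{col}: {val}" for col, val in zip(header, row))
            for row in rows]

# Hypothetical 3GPP-style parameter table (values are made up).
header = ["Parameter", "Value", "Unit"]
rows = [["Max bandwidth", "100", "MHz"],
        ["Subcarrier spacing", "30", "kHz"]]
for chunk in table_to_row_chunks(header, rows):
    print(chunk)
# Parameter: Max bandwidth; Value: 100; Unit: MHz
# Parameter: Subcarrier spacing; Value: 30; Unit: kHz
```

Because every chunk carries its headers, a row retrieved in isolation remains interpretable to the retriever and the reader.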
Tabular Data Augmentation for Machine Learning: Progress and Prospects of Embracing Generative AI
Cui, Lingxi, Li, Huan, Chen, Ke, Shou, Lidan, Chen, Gang
Machine learning (ML) on tabular data is ubiquitous, yet obtaining abundant high-quality tabular data for model training remains a significant obstacle. Numerous works have focused on tabular data augmentation (TDA) to enhance the original table with additional data, thereby improving downstream ML tasks. Recently, there has been a growing interest in leveraging the capabilities of generative AI for TDA. Therefore, we believe it is time to provide a comprehensive review of the progress and future prospects of TDA, with a particular emphasis on the trending generative AI. Specifically, we present an architectural view of the TDA pipeline, comprising three main procedures: pre-augmentation, augmentation, and post-augmentation. Pre-augmentation encompasses preparation tasks that facilitate subsequent TDA, including error handling, table annotation, table simplification, table representation, table indexing, table navigation, schema matching, and entity matching. Augmentation systematically analyzes current TDA methods, categorized into retrieval-based methods, which retrieve external data, and generation-based methods, which generate synthetic data. We further subdivide these methods based on the granularity of the augmentation process at the row, column, cell, and table levels. Post-augmentation focuses on the datasets, evaluation and optimization aspects of TDA. We also summarize current trends and future directions for TDA, highlighting promising opportunities in the era of generative AI. In addition, the accompanying papers and related resources are continuously updated and maintained in the GitHub repository at https://github.com/SuDIS-ZJU/awesome-tabular-data-augmentation to reflect ongoing advancements in the field.
RTF: Region-based Table Filling Method for Relational Triple Extraction
An, Ning, Hei, Lei, Jiang, Yong, Meng, Weiping, Hu, Jingjing, Huang, Boran, Ren, Feiliang
Relational triple extraction is crucial for the automatic construction of knowledge graphs. Existing methods construct only shallow representations at the token or token-pair level and ignore the local spatial dependencies of relational triples, which weakens entity-pair boundary detection. To tackle this problem, we propose a novel Region-based Table Filling method (RTF). We devise a novel region-based tagging scheme and bi-directional decoding strategy, which regard each relational triple as a region on the relation-specific table and identify triples by determining the two endpoints of each region. We also introduce convolution to construct region-level table representations from a spatial perspective, which makes triples easier to capture. In addition, we share partial tagging scores among different relations to improve the learning efficiency of the relation classifier. Experimental results show that our method achieves state-of-the-art performance with better generalization capability on three variants of two widely used benchmark datasets.
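A minimal sketch of the region idea, assuming a single axis-aligned region per relation-specific table; the function names and the toy decoder are illustrative, not RTF's actual tagging scheme:

```python
import numpy as np

def fill_region(table, head_span, tail_span):
    """Mark the rectangular region of one triple: rows cover the head
    entity's token span, columns cover the tail entity's token span."""
    (hs, he), (ts, te) = head_span, tail_span
    table[hs:he + 1, ts:te + 1] = 1

def decode_region(table):
    """Recover the region from its two endpoints (top-left and
    bottom-right corners). Toy decoder: assumes one region per table."""
    ones = np.argwhere(table == 1)
    if len(ones) == 0:
        return None
    top, left = (int(v) for v in ones.min(axis=0))
    bottom, right = (int(v) for v in ones.max(axis=0))
    return (top, bottom), (left, right)

n = 8                                  # sentence length in tokens
table = np.zeros((n, n), dtype=int)    # one table per relation type
fill_region(table, head_span=(1, 2), tail_span=(5, 6))
print(decode_region(table))            # ((1, 2), (5, 6))
```

Decoding a whole rectangle from its two corners is what gives the method its explicit entity-pair boundaries, in contrast to per-cell tagging.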
Tables as Texts or Images: Evaluating the Table Reasoning Ability of LLMs and MLLMs
Deng, Naihao, Sun, Zhenjie, He, Ruiqi, Sikka, Aman, Chen, Yulong, Ma, Lin, Zhang, Yue, Mihalcea, Rada
Specifically, we investigate several research questions, including the effectiveness of image-based representation of tabular data and how different text-based or image-based prompt methods affect LLMs' performance on table-related tasks. In addition, we provide analysis and hypothesis of LLMs' behaviors. Our findings include: LLMs maintain decent performance when we use image-based table representations. Sometimes, image-based table representations can make LLMs perform better.

Recent years have witnessed an explosion of Large Language Models (LLMs), with impressive performance on various Natural Language Processing (NLP) tasks (Brown et al., 2020; Touvron et al., 2023; Team et al., 2023). Research to date has examined the performance of LLMs for various aspects and abilities (Bang et al., 2023b; Bubeck et al., 2023; Akter et al., 2023), but their effectiveness on structured data such as tables is less explored. Unlike unstructured text, tables are systematically organized structures of a large amount of information. This characteristic makes tabular …
Polynomial-based Self-Attention for Table Representation learning
Kim, Jayoung, Shin, Yehjin, Choi, Jeongwhan, Wi, Hyowon, Park, Noseong
Structured data, which constitutes a significant portion of existing data types, has been a long-standing research topic in the field of machine learning. Various representation learning methods for tabular data have been proposed, ranging from encoder-decoder structures to Transformers. Among these, Transformer-based methods have achieved state-of-the-art performance not only in tabular data but also in various other fields, including computer vision and natural language processing. However, recent studies have revealed that self-attention, a key component of Transformers, can lead to an oversmoothing issue. We show that Transformers for tabular data also face this problem, and to address it, we propose a novel matrix polynomial-based self-attention layer as a substitute for the original self-attention layer, which enhances model scalability. In our experiments with three representative table learning models equipped with our proposed layer, we illustrate that the layer effectively mitigates the oversmoothing problem and enhances the representation performance of the existing methods, outperforming the state-of-the-art table representation methods.

However, recent studies have raised concerns about the potential limitations of self-attention, a fundamental component of Transformers, specifically an issue of oversmoothing (Dong et al., 2021; Wang et al., 2022; Guo et al., 2023; Xue et al., 2023). Gong et al. (2021) and Zhou et al. (2021) have highlighted that at deeper layers of the Transformer architecture, all token representations tend to become nearly identical (Brunner et al., 2019). This problem poses challenges for scaling up the training of Transformers, especially in terms of depth, since Transformers rely on a simple weighted-average aggregation of value vectors. In our preliminary experiments, we observe that Transformers designed for tabular data also exhibit the oversmoothing issue, as illustrated in Fig. 1.
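The oversmoothing effect, and why mixing the identity into the attention matrix (the simplest matrix polynomial, a0*I + a1*A) slows it down, can be illustrated with a toy numerical example; the random matrices and the 0.5/0.5 coefficients below are arbitrary stand-ins, not the paper's proposed layer:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))          # 6 token representations, dim 4

# A row-stochastic "attention" matrix: each output token is a weighted
# average of all value vectors, as in standard self-attention.
A = rng.random((6, 6))
A /= A.sum(axis=1, keepdims=True)

def spread(M):
    """Distance of token representations from their mean, a crude
    proxy for oversmoothing (0 means all tokens are identical)."""
    return float(np.linalg.norm(M - M.mean(axis=0)))

Xa, Xp = X.copy(), X.copy()
P = 0.5 * np.eye(6) + 0.5 * A        # simplest matrix polynomial: a0*I + a1*A
for _ in range(10):                  # stack 10 "layers"
    Xa = A @ Xa                      # plain averaging: collapses quickly
    Xp = P @ Xp                      # identity term keeps each token's own signal

print(f"initial spread:              {spread(X):.4f}")
print(f"plain attention spread:      {spread(Xa):.8f}")
print(f"polynomial attention spread: {spread(Xp):.8f}")
```

The repeated weighted average shrinks all token representations toward a common vector; the identity term in the polynomial retains part of each token's original signal at every layer, so the spread decays more slowly.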
Enhancing Open-Domain Table Question Answering via Syntax- and Structure-aware Dense Retrieval
Jin, Nengzheng, Li, Dongfang, Chen, Junying, Siebert, Joanna, Chen, Qingcai
Open-domain table question answering aims to answer a question by retrieving and extracting information from a large collection of tables. Existing studies of open-domain table QA either directly adopt text retrieval methods or consider the table structure only in the encoding layer for table retrieval, which may cause syntactical and structural information loss during table scoring. To address this issue, we propose a syntax- and structure-aware retrieval method for the open-domain table QA task. It provides syntactical representations for the question and uses the structural header and value representations for the tables to avoid the loss of fine-grained syntactical and structural information. Then, a syntactical-to-structural aggregator is used to obtain the matching score between the question and a candidate table by mimicking the human retrieval process. Experimental results show that our method achieves the state of the art on the NQ-tables dataset and substantially outperforms strong baselines on a newly curated open-domain Text-to-SQL dataset.
RoTaR: Efficient Row-Based Table Representation Learning via Teacher-Student Training
Chen, Zui, Cao, Lei, Madden, Sam
We propose RoTaR, a row-based table representation learning method, to address the efficiency and scalability issues faced by existing table representation learning methods. The key idea of RoTaR is to generate query-agnostic row representations that can be re-used via query-specific aggregation. In addition to the row-based architecture, we introduce several techniques, cell-aware position embedding, a teacher-student training paradigm, and selective backward passes, to improve the performance of the RoTaR model.
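The split between cached, query-agnostic row encodings and query-specific aggregation might be sketched as follows; the random "encoder" outputs and the softmax aggregator are illustrative assumptions, not RoTaR's actual components:

```python
import numpy as np

rng = np.random.default_rng(1)

# Query-agnostic row representations: each table row is encoded once and
# cached for re-use across queries. Random vectors stand in for the
# output of a learned row encoder.
row_reprs = rng.normal(size=(5, 8))    # 5 rows, embedding dim 8

def aggregate(query_vec, rows):
    """Query-specific aggregation: softmax attention over the cached
    row representations. A sketch of the idea, not RoTaR's architecture."""
    scores = rows @ query_vec                 # one relevance score per row
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    weights /= weights.sum()
    return weights @ rows                     # weighted mix of row vectors

query = rng.normal(size=8)                    # stand-in for an encoded query
table_repr = aggregate(query, row_reprs)
print(table_repr.shape)                       # (8,)
```

Because the expensive row encoding happens once per table rather than once per (table, query) pair, only the cheap aggregation step runs at query time.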
Scalable and Interpretable Data Representation for High-Dimensional, Complex Data
Kim, Been (Massachusetts Institute of Technology) | Patel, Kayur (Google) | Rostamizadeh, Afshin (Google) | Shah, Julie (Massachusetts Institute of Technology)
The majority of machine learning research has been focused on building models and inference techniques with sound mathematical properties and cutting edge performance. Little attention has been devoted to the development of data representation that can be used to improve a user's ability to interpret the data and machine learning models to solve real-world problems. In this paper, we quantitatively and qualitatively evaluate an efficient, accurate and scalable feature-compression method using latent Dirichlet allocation for discrete data. This representation can effectively communicate the characteristics of high-dimensional, complex data points. We show that the improvement of a user's interpretability through the use of a topic modeling-based compression technique is statistically significant, according to a number of metrics, when compared with other representations. Also, we find that this representation is scalable --- it maintains alignment with human classification accuracy as an increasing number of data points are shown. In addition, the learned topic layer can semantically deliver meaningful information to users that could potentially aid human reasoning about data characteristics in connection with compressed topic space.
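The compression idea, mapping a high-dimensional bag of discrete counts to a short, interpretable topic mixture, can be sketched as follows; the hard-coded topics and the one-step estimate below stand in for a fitted LDA model and its inference procedure:

```python
import numpy as np

# Two illustrative "topics" over a 4-word vocabulary (rows sum to 1).
# These stand in for topics learned by LDA; real topics come from
# fitting the model, not from hard-coding.
topics = np.array([
    [0.70, 0.20, 0.05, 0.05],   # topic 0: mostly words 0-1
    [0.05, 0.05, 0.20, 0.70],   # topic 1: mostly words 2-3
])

def compress(word_counts):
    """Compress a high-dimensional bag-of-words vector into a short
    topic-proportion vector (a single E-step-style estimate with a
    uniform topic prior; a sketch, not full LDA inference)."""
    # responsibility of each topic for each vocabulary word, shape (K, V)
    resp = topics / topics.sum(axis=0, keepdims=True)
    mix = resp @ word_counts        # expected topic counts for the document
    return mix / mix.sum()          # normalize to proportions

doc = np.array([10, 8, 1, 0])       # counts over the 4-word vocabulary
print(compress(doc))                # dominated by topic 0
```

The compressed vector has one entry per topic instead of one per vocabulary word, which is what makes the representation both low-dimensional and human-readable.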